Knowledge graphs (KGs) have served as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations, which do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of both graph representation learning and few-shot learning, has been proposed to tackle the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks, organized by methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions for FKGC.
In this paper, we investigate joint device activity and data detection in massive machine-type communications (mMTC) with a one-phase non-coherent scheme, where data bits are embedded in the pilot sequences and the base station simultaneously detects active devices and their embedded data bits without explicit channel estimation. Due to the correlated sparsity pattern introduced by the non-coherent transmission scheme, the traditional approximate message passing (AMP) algorithm cannot achieve satisfactory performance. Therefore, we propose a deep learning (DL) modified AMP network (DL-mAMPnet) that improves detection performance by effectively exploiting the pilot activity correlation. The DL-mAMPnet is constructed by unfolding the AMP algorithm into a feedforward neural network, which combines the principled mathematical model of the AMP algorithm with the powerful learning capability of deep learning, thereby benefiting from the advantages of both techniques. Trainable parameters are introduced in the DL-mAMPnet to approximate the correlated sparsity pattern and the large-scale fading coefficient. Moreover, a refinement module is designed to further improve performance by exploiting the spatial features induced by the correlated sparsity pattern. Simulation results demonstrate that the proposed DL-mAMPnet significantly outperforms traditional algorithms in terms of symbol error rate.
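DL-mAMPnet is built by unfolding AMP into network layers. For orientation, the classical AMP iteration it starts from can be sketched for a generic compressed-sensing model y = Ax with sparse x. This is the textbook soft-thresholding AMP, not the paper's network: the mMTC signal model, the correlated-sparsity denoiser, the trainable parameters and the refinement module are not reproduced. In an unfolded network each loop iteration would become one layer, and a quantity such as the threshold multiplier `alpha` (a name chosen here) could plausibly be learned per layer.

```python
import math
import random


def soft_threshold(v, t):
    """Elementwise soft thresholding: sign(x) * max(|x| - t, 0)."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def amp(A, y, alpha=1.5, iters=30):
    """Classical AMP for y = A x with sparse x, using a soft-threshold denoiser.

    A is an M x N matrix given as a list of rows whose columns have roughly
    unit norm (e.g. i.i.d. N(0, 1/M) entries). The threshold is alpha times
    the residual-based noise estimate ||z|| / sqrt(M).
    """
    M, N = len(A), len(A[0])
    x = [0.0] * N
    z = list(y)
    for _ in range(iters):
        # Pseudo-data: current estimate plus the matched filter A^T z.
        r = [x[j] + sum(A[i][j] * z[i] for i in range(M)) for j in range(N)]
        tau = alpha * math.sqrt(sum(zi * zi for zi in z) / M)
        x = soft_threshold(r, tau)
        # Onsager correction: (N/M) * mean derivative of the denoiser,
        # which for soft thresholding equals (#nonzero entries) / M.
        onsager = sum(1 for v in x if v != 0.0) / M
        Ax = [sum(A[i][j] * x[j] for j in range(N)) for i in range(M)]
        # The right-hand side uses the previous residual z before reassignment.
        z = [y[i] - Ax[i] + onsager * z[i] for i in range(M)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    M, N, k = 60, 120, 6
    A = [[rng.gauss(0.0, 1.0 / math.sqrt(M)) for _ in range(N)] for _ in range(M)]
    x_true = [0.0] * N
    for j in rng.sample(range(N), k):
        x_true[j] = rng.choice([-1.0, 1.0])
    y = [sum(A[i][j] * x_true[j] for j in range(N)) for i in range(M)]
    x_hat = amp(A, y, iters=50)
    print(max(abs(a - b) for a, b in zip(x_hat, x_true)))
```

The Onsager term is what separates AMP from plain iterative thresholding; dropping it usually slows or breaks convergence.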
A neural network language model (NNLM) plays an essential role in automatic speech recognition (ASR) systems, especially in adaptation tasks where only text data is available. In practice, an NNLM is typically trained on a combination of data sampled from multiple corpora, so the data sampling strategy is important to adaptation performance. Most existing works focus on designing static sampling strategies. However, each corpus may have a different impact at different stages of NNLM training. In this paper, we introduce a novel adaptive multi-corpora training algorithm that dynamically learns and adjusts the sampling probability of each corpus throughout training. The algorithm is robust to corpus sizes and domain relevance. Compared with static sampling baselines, the proposed approach yields up to 7% and 9% relative word error rate (WER) reductions on in-domain and out-of-domain adaptation tasks, respectively.
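The abstract does not spell out how the sampling probabilities are updated. One common way to adapt per-source sampling probabilities online, used here purely as a stand-in, is an exponentiated-gradient (multiplicative-weights) update driven by a per-corpus reward such as the drop in held-out loss. The class name, `lr`, `floor` and the reward definition are all hypothetical.

```python
import math
import random


class AdaptiveCorpusSampler:
    """Hypothetical sampler: one probability per corpus, re-weighted with an
    exponentiated-gradient (multiplicative-weights) update."""

    def __init__(self, corpus_names, lr=0.5, floor=0.01, seed=0):
        self.names = list(corpus_names)
        if floor * len(self.names) > 1.0:
            raise ValueError("floor * number_of_corpora must be <= 1")
        self.probs = [1.0 / len(self.names)] * len(self.names)
        self.lr = lr
        self.floor = floor  # minimum probability, so every corpus keeps being sampled
        self.rng = random.Random(seed)

    def sample(self):
        """Pick the corpus to draw the next training batch from."""
        return self.rng.choices(self.names, weights=self.probs, k=1)[0]

    def update(self, rewards):
        """rewards[name]: e.g. drop in dev-set loss after recent steps on that corpus."""
        w = [p * math.exp(self.lr * rewards.get(n, 0.0))
             for n, p in zip(self.names, self.probs)]
        total = sum(w)
        w = [x / total for x in w]
        # Mix with a uniform floor so no probability collapses to zero.
        k = len(w)
        self.probs = [(1.0 - self.floor * k) * x + self.floor for x in w]


if __name__ == "__main__":
    s = AdaptiveCorpusSampler(["in_domain", "general", "noisy"])
    for _ in range(10):
        s.update({"in_domain": 1.0, "general": 0.2, "noisy": -0.5})
    print(s.probs, s.sample())
```

Because the update is multiplicative, a corpus whose usefulness changes later in training can regain probability mass, which is the behavior a static mixture cannot provide.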
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that achieves real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models for high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with this NPU and reach up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
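To illustrate what running an INT8 model involves, here is symmetric per-tensor quantization (zero point 0, range [-127, 127]) and an integer dot product that accumulates in full-width integers before a single float rescale. This scheme is an assumption chosen for illustration; the abstract does not say which quantization scheme or toolchain the VS680 NPU uses, and the function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: q = clamp(round(v / scale), -127, 127)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Map INT8 codes back to approximate real values."""
    return [x * scale for x in q]


def int8_dot(qa, sa, qb, sb):
    """Integer multiply-accumulate (an int32 accumulator on real hardware),
    followed by one rescale by the product of the two scales."""
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc * sa * sb


if __name__ == "__main__":
    a, b = [0.5, -1.0, 0.25], [1.0, 0.5, -2.0]
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    print(int8_dot(qa, sa, qb, sb), sum(x * y for x, y in zip(a, b)))
```

Each dequantized value is within half a quantization step (`scale / 2`) of the original, which is why INT8 super-resolution models are usually trained or fine-tuned with quantization in the loop rather than quantized blindly after training.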
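Recently, the vision transformer and its variants have played an increasingly important role in both monocular and multi-view human pose estimation. By treating image patches as tokens, transformers can model global dependencies across the entire image or across images from other views. However, global attention is computationally expensive. As a result, it is difficult to scale these transformer-based methods to high-resolution features and many views. In this paper, we propose the Token-Pruned Pose Transformer (PPT) for 2D human pose estimation, which locates a rough human mask and performs self-attention only within the selected tokens. Furthermore, we extend PPT to multi-view human pose estimation. Building on PPT, we propose a new cross-view fusion strategy, called human area fusion, which considers all human foreground pixels as corresponding candidates. Experimental results on COCO and MPII show that PPT can match the accuracy of previous pose transformer methods while reducing computation. Moreover, experiments on Human3.6M and Ski-Pose show that our multi-view PPT can efficiently fuse cues from multiple views and achieve new state-of-the-art results.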
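The token-pruning idea can be sketched as follows, assuming as a simplification that each visual token is scored by the total attention it receives from the keypoint tokens and that only the top `keep_ratio` fraction is kept. Plain dot-product attention without learned projections stands in for a transformer layer, so attention cost drops from O(n^2) over all tokens to O(k^2) over the kept ones. The function names and scoring rule are illustrative, not PPT's code.

```python
import math


def prune_tokens(tokens, keypoint_attention, keep_ratio):
    """tokens: list of visual-token vectors.
    keypoint_attention[k][i]: attention from keypoint token k to visual token i."""
    n = len(tokens)
    # Score each visual token by the attention it receives from all keypoint tokens.
    scores = [sum(row[i] for row in keypoint_attention) for i in range(n)]
    k = max(1, round(keep_ratio * n))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    keep = sorted(top)  # restore the spatial order of the kept tokens
    return [tokens[i] for i in keep], keep


def self_attention(vectors):
    """Single-head scaled dot-product attention, queries = keys = values = vectors."""
    d = len(vectors[0])
    out = []
    for q in vectors:
        logits = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in vectors]
        m = max(logits)  # subtract the max for a numerically stable softmax
        w = [math.exp(l - m) for l in logits]
        s = sum(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, vectors)) / s for j in range(d)])
    return out


if __name__ == "__main__":
    tokens = [[float(i), 1.0] for i in range(6)]
    attn = [[0.1, 0.9, 0.0, 0.8, 0.05, 0.2],
            [0.0, 0.7, 0.1, 0.9, 0.0, 0.3]]
    kept, idx = prune_tokens(tokens, attn, keep_ratio=0.5)
    print(idx, self_attention(kept))
```

Keeping the indices of the selected tokens is what would allow pruned features to be scattered back to their image positions, for example to form the rough human mask.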
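Gait planning is a procedure typically applied to ground robots, such as quadrupedal robots; the tilt-rotor, a novel type of quadrotor with eight inputs, is not one of them. When the tilt-rotor is controlled with feedback linearization, the tilting angles (inputs) are expected to change excessively, which may be undesirable in applications. To help suppress these intense changes in the tilting angles, a gait-planning procedure is introduced for the tilt-rotor before feedback linearization: the tilting angles are specified in advance by the user rather than given by the control law. In this setting, however, the decoupling matrix in feedback linearization can be singular for some attitudes, i.e., some combinations of roll and pitch angles, which hinders further application of feedback linearization. Hence, the two-color map theorem was established to maximize the acceptable attitude region, in which the combinations of roll and pitch yield an invertible decoupling matrix. However, that theorem over-restricts the choice of tilting angles and can exclude some feasible robust gaits. This paper gives a generalized two-color map theorem, from which all robust gaits can be found. The robustness of three gaits that satisfy the generalized two-color map theorem (while violating the original two-color map theorem) is analyzed. The results show that the generalized two-color map theorem completes the search for robust gaits for the tilt-rotor.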
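Face recognition based on deep convolutional neural networks (CNNs) shows superior accuracy, attributed to the highly discriminative features it extracts. However, the security and privacy of the features extracted by deep learning models (deep features) have often been overlooked. This paper formulates the reconstruction of face images from deep features, without access to the CNN network configuration, as a constrained optimization problem. The optimization minimizes the distance between the features extracted from the original face image and those extracted from the reconstructed face image. Instead of solving the optimization problem directly in image space, we reformulate it as a search for a latent vector of a GAN generator, which is then used to generate the face image. The GAN generator plays a dual role in this novel framework: it serves both as the face-distribution constraint of the optimization objective and as the face generator. In addition to the novel optimization task, we propose an attack pipeline that impersonates the target user based on the generated face images. Our results show that the generated face images achieve a state-of-the-art attack success rate on LFW under a type-I attack at a false acceptance rate (FAR) of 0.1%. Our work sheds light on biometric deployments that must comply with privacy and security policies.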
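The reformulation above (searching a generator's latent space instead of pixel space) can be sketched with toy stand-ins. Here `toy_generator` and `toy_extractor` are hypothetical linear maps replacing a GAN generator and a face-recognition CNN. The extractor is only queried as a black box, with gradients estimated by central finite differences, and plain gradient descent on the squared feature distance replaces whatever optimizer the paper actually uses.

```python
import random


def toy_generator(z):
    """Hypothetical stand-in for a GAN generator: 2-D latent -> 3-D 'image'."""
    return [z[0], z[1], z[0] + z[1]]


def toy_extractor(img):
    """Hypothetical stand-in for a black-box face-feature extractor."""
    return [img[0] + 0.5 * img[2], img[1] - 0.5 * img[2]]


def feature_loss(z, target, generator, extractor):
    """Squared distance between the features of G(z) and the target features."""
    feat = extractor(generator(z))
    return sum((a - b) ** 2 for a, b in zip(feat, target))


def invert_features(target, generator, extractor, dim, steps=200, lr=0.1, eps=1e-4, seed=0):
    """Search the latent space (not pixel space) for a z whose image matches the target features."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random latent initialization
    for _ in range(steps):
        grad = []
        for i in range(dim):
            # Central finite difference: needs only forward queries of the extractor.
            z_plus, z_minus = list(z), list(z)
            z_plus[i] += eps
            z_minus[i] -= eps
            diff = (feature_loss(z_plus, target, generator, extractor)
                    - feature_loss(z_minus, target, generator, extractor))
            grad.append(diff / (2 * eps))
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z


if __name__ == "__main__":
    z_true = [0.3, -0.7]
    target = toy_extractor(toy_generator(z_true))
    z_hat = invert_features(target, toy_generator, toy_extractor, dim=2)
    print(z_hat, toy_generator(z_hat))
```

Because every candidate is produced by the generator, the search can only return outputs on the generator's manifold, which is the "face distribution constraint" role described above; the same generator then renders the final reconstruction.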
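Aerial access networks have been recognized as an important driver of various Internet of Things (IoT) services and applications. In particular, the aerial computing network infrastructure centered on the Internet of Drones has set off a new revolution in automatic image recognition. This emerging technology relies on sharing ground-truth labeled data among unmanned aerial vehicle (UAV) swarms to train high-quality automatic image recognition models. However, this approach brings data privacy and data availability challenges. To address these issues, we first present a semi-supervised federated learning (SSFL) framework for privacy-preserving UAV image recognition. Specifically, we propose a model parameter mixing strategy, referred to as Federated Mixing (FedMix), to improve the naive combination of FL and semi-supervised learning methods under two realistic scenarios (labels-at-client and labels-at-server). Furthermore, the local data collected by UAVs using different camera modules in different environments differ significantly in number, features, and distribution, i.e., they exhibit statistical heterogeneity. To alleviate the statistical heterogeneity problem, we propose an aggregation rule based on the frequency of each client's participation in training, namely the FedFreq aggregation rule, which adjusts the weight of the corresponding local model according to its frequency. Numerical results demonstrate that our proposed method performs significantly better than current baselines and is robust to different non-IID levels of client data.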
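A frequency-based aggregation rule and a parameter-mixing step can be sketched as below. The abstract says FedFreq adjusts each local model's weight according to its participation frequency but not in which direction. This sketch assumes inverse-frequency weighting (rarely participating clients get more weight), and it models parameter mixing as a convex combination of two parameter vectors. Both choices and all function names are hypothetical, not the paper's definitions.

```python
def fedfreq_weights(participation_counts):
    """Hypothetical rule: weight each client inversely to how often it has participated."""
    inv = {client: 1.0 / count for client, count in participation_counts.items()}
    total = sum(inv.values())
    return {client: w / total for client, w in inv.items()}


def aggregate(local_params, weights):
    """Weighted average of flattened parameter vectors, keyed by client id."""
    dim = len(next(iter(local_params.values())))
    return [sum(weights[c] * p[i] for c, p in local_params.items()) for i in range(dim)]


def mix_params(params_a, params_b, alpha):
    """Hypothetical mixing step: convex combination of two parameter vectors,
    e.g. a model trained on labeled data and one trained on unlabeled data."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(params_a, params_b)]


if __name__ == "__main__":
    w = fedfreq_weights({"uav_a": 1, "uav_b": 3})
    global_model = aggregate({"uav_a": [1.0, 0.0], "uav_b": [0.0, 4.0]}, w)
    print(w, global_model, mix_params([1.0, 1.0], [3.0, 5.0], 0.5))
```

Compared with FedAvg's data-size weighting, a participation-based weight is one way to stop frequently sampled clients from dominating the global model when client data are non-IID.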
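Recently, records on stereo matching benchmarks have been constantly broken by end-to-end disparity networks. However, the domain adaptation ability of these deep models is quite limited. To address this problem, we propose a novel domain-adaptive approach called AdaStereo, which aims to align multi-level representations for deep stereo matching networks. Compared with previous methods, AdaStereo realizes a more standard, complete, and effective domain adaptation pipeline. First, we propose a non-adversarial progressive color transfer algorithm for input image-level alignment. Second, we design an efficient parameter-free cost normalization layer for internal feature-level alignment. Finally, an efficient auxiliary task, self-supervised occlusion-aware reconstruction, is proposed to narrow the gap in the output space. We conduct extensive ablation studies and break-down comparisons to validate the effectiveness of each proposed module. With no extra inference overhead and only a slight increase in training complexity, our AdaStereo models achieve state-of-the-art cross-domain performance on multiple benchmarks, including KITTI, Middlebury, ETH3D, and DrivingStereo, even outperforming some state-of-the-art disparity networks fine-tuned with target-domain ground truth. Moreover, two additional evaluation metrics reveal the superiority of our domain-adaptive stereo matching pipeline from further perspectives. Finally, we demonstrate that our method is robust to various domain adaptation settings and can be easily integrated into quick-adaptation application scenarios and real-world deployments.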
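A parameter-free normalization applied to features before building a correlation cost volume can be sketched as follows. The sketch assumes channel-wise L2 normalization followed by pixel-wise L2 normalization, which is one reading of a "parameter-free cost normalization layer"; the abstract does not give the exact layer. Both steps divide out scale, so multiplicative per-channel or per-pixel rescalings between domains no longer change the costs. For brevity, features are a single image row indexed `feat[p][c]`, and the function names are hypothetical.

```python
import math


def _l2(values, eps=1e-8):
    """L2 norm plus a small epsilon to avoid division by zero."""
    return math.sqrt(sum(v * v for v in values)) + eps


def cost_normalize(feat):
    """feat[p][c]: C-dim feature of pixel p in one image row. No learned parameters."""
    P, C = len(feat), len(feat[0])
    # Channel normalization: L2 over all pixels, separately for each channel.
    ch = [_l2([feat[p][c] for p in range(P)]) for c in range(C)]
    feat = [[feat[p][c] / ch[c] for c in range(C)] for p in range(P)]
    # Pixel normalization: L2 over channels, separately for each pixel.
    return [[v / _l2(row) for v in row] for row in feat]


def correlation_cost(left, right, max_disp):
    """cost[x][d] = <left[x], right[x - d]>, or 0.0 where x - d falls outside the row."""
    W = len(left)
    return [[sum(a * b for a, b in zip(left[x], right[x - d])) if x - d >= 0 else 0.0
             for d in range(max_disp)] for x in range(W)]


if __name__ == "__main__":
    left = [[1.0, 2.0], [3.0, 1.0], [0.5, 4.0]]
    rescaled = [[10.0 * a, 3.0 * b] for a, b in left]  # a crude "domain shift"
    print(cost_normalize(left))
    print(cost_normalize(rescaled))
    print(correlation_cost(cost_normalize(left), cost_normalize(left), max_disp=2))
```

After pixel normalization every feature has (almost) unit norm, so each correlation cost is a cosine similarity bounded by 1 in magnitude regardless of the input domain's feature scale.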
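Estimating 2D human poses in each view is typically the first step in calibrated multi-view 3D pose estimation. However, the performance of 2D pose detectors suffers in challenging situations such as occlusions and oblique viewing angles. To address these challenges, previous works derive point-to-point correspondences between different views from epipolar geometry and use the correspondences to merge prediction heatmaps or feature representations. Instead of post-prediction merging or calibration, we introduce a transformer framework for multi-view 3D pose estimation that aims to directly improve individual 2D predictors by integrating information from different views. Inspired by previous multi-modal transformers, we design a unified transformer architecture, named TransFusion, that fuses cues from both the current view and neighboring views. Moreover, we propose the concept of the epipolar field to encode 3D positional information into the transformer model. The 3D position encoding guided by the epipolar field provides an efficient way to encode correspondences between pixels of different views. Experiments on Human3.6M and Ski-Pose show that our method is more efficient than other fusion methods and yields consistent improvements over them. Specifically, we achieve 25.8 mm MPJPE on Human3.6M with only 5M parameters at 256 x 256 resolution.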
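The geometry behind an epipolar field can be sketched with the standard epipolar-line relation: a pixel (x, y) in one view maps to the line l = F [x, y, 1]^T in the other view, where F is the fundamental matrix. The field below assigns each pixel of the other view a Gaussian weight of its distance to that line. The Gaussian form, `sigma`, and the function names are assumptions for illustration, not TransFusion's exact encoding.

```python
import math


def epipolar_line(F, x, y):
    """Line l = F @ [x, y, 1] in the other view, as (a, b, c) with a*u + b*v + c = 0."""
    p = (x, y, 1.0)
    return tuple(sum(F[r][k] * p[k] for k in range(3)) for r in range(3))


def point_line_distance(line, u, v):
    """Perpendicular pixel distance from (u, v) to the line (a, b, c)."""
    a, b, c = line
    return abs(a * u + b * v + c) / math.hypot(a, b)


def epipolar_field(F, x, y, width, height, sigma=2.0):
    """field[v][u]: weight near 1 for pixels close to the epipolar line of (x, y)."""
    line = epipolar_line(F, x, y)
    return [[math.exp(-point_line_distance(line, u, v) ** 2 / (2 * sigma ** 2))
             for u in range(width)] for v in range(height)]


if __name__ == "__main__":
    # Fundamental matrix of an ideal rectified pair (pure horizontal translation):
    # epipolar lines are image rows, so (10, 5) maps to the row v = 5.
    F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
    print(epipolar_line(F, 10, 5))
    field = epipolar_field(F, 10, 5, width=12, height=10)
    print([round(field[v][3], 3) for v in range(10)])
```

Any pixel on the epipolar line is a geometrically valid match, so such a field marks a band of candidate correspondences rather than a single point, which is what lets it act as a cross-view positional prior.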